Causal Reinforcement Learning: An Instrumental Variable Approach

Abstract

In the standard data analysis framework, data is first collected (once for all), and then data analysis is carried out. Moreover, the data-generating process is typically assumed to be exogenous. This approach is natural when the data analyst has no impact on how the data is generated. The advancement of digital technology, however, has enabled firms to learn from data and make decisions at the same time. As these decisions generate new data, the data analyst (a business manager or an algorithm) also becomes the data generator. In this article, we formulate the problem as a Markov decision process (MDP) and show that this interaction generates a new type of bias, reinforcement bias, that exacerbates the endogeneity problem in static data analysis. When the data are independent and identically distributed, we embed the instrumental variable (IV) approach in the stochastic gradient descent algorithm to correct the bias. For general MDP problems, we propose a class of IV-based reinforcement learning (RL) algorithms to correct the bias. We establish the asymptotic properties of the algorithms by incorporating them into a two-timescale stochastic approximation (SA) framework. Our formulation requires an unbounded state space and, more importantly, Markovian noise. Therefore, the standard techniques in the RL and SA literature, which rely on the boundedness of the state space and the martingale-difference structure of the noise, do not apply. We develop new techniques to establish finite-time risk bounds, finite-time bounds for trajectory stability, and the asymptotic distribution of the IV-RL algorithms.
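The i.i.d. case mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the linear model, the simulated data, the step-size schedule, and the function names `simulate` and `sgd_estimate` are all assumptions made for illustration. The idea shown is that replacing the regressor with an instrument inside the stochastic gradient step moves the iterate toward the solution of the moment condition E[z (y - x θ)] = 0 rather than the confounded least-squares limit.

```python
import numpy as np


def simulate(n, theta_true=2.0, seed=0):
    """Hypothetical i.i.d. data with an endogenous regressor x and an instrument z."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                      # instrument: affects x, independent of u
    u = rng.normal(size=n)                      # unobserved confounder
    x = z + u + 0.5 * rng.normal(size=n)        # regressor is correlated with u
    y = theta_true * x + u + 0.5 * rng.normal(size=n)
    return z, x, y


def sgd_estimate(x, y, direction, c=1.0, t0=20.0):
    """One pass of stochastic approximation for a scalar parameter.

    With direction = x this is plain least-squares SGD.
    With direction = z (the instrument) each step follows z_t * residual,
    which targets the root of E[z (y - x * theta)] = 0.
    The step size c / (t + t0) is an assumed schedule, not taken from the paper.
    """
    theta = 0.0
    for t in range(len(y)):
        step = c / (t + t0)
        residual = y[t] - x[t] * theta
        theta += step * direction[t] * residual
    return theta


if __name__ == "__main__":
    z, x, y = simulate(100_000)
    print("plain SGD estimate:", sgd_estimate(x, y, x))
    print("IV-SGD estimate:   ", sgd_estimate(x, y, z))
```

In this simulated setup the plain update settles above the true coefficient because the regressor is correlated with the error term, while the instrumented update settles close to it. The reinforcement bias and the two-timescale RL algorithms described in the abstract involve additional structure that this sketch does not attempt to reproduce.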

Similar resources

Mendelian randomization as an instrumental variable approach to causal inference.

In epidemiological research, the causal effect of a modifiable phenotype or exposure on a disease is often of public health interest. Randomized controlled trials to investigate this effect are not always possible and inferences based on observational data can be confounded. However, if we know of a gene closely linked to the phenotype without direct effect on the disease, it can often be reaso...

Variable Impedance Control - A Reinforcement Learning Approach

One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not t...

Complier-average causal effects for multivariate outcomes: an instrumental variable approach with application to health economics

In randomised controlled trials that have non-compliance with the treatment assigned, policy makers require unbiased estimates of the causal effect of the treatment received. Instrumental variable (IV) approaches provide complier average causal effects (CACE) estimates. Common IV methods such as two-stage least squares (2SLS) have not been extended to settings with multivariate outcomes. We pro...

Identification of causal relations in neuroimaging data with latent confounders: An instrumental variable approach

We consider the task of inferring causal relations in brain imaging data with latent confounders. Using a priori knowledge that randomized experimental conditions cannot be effects of brain activity, we derive statistical conditions that are sufficient for establishing a causal relation between two neural processes, even in the presence of latent confounders. We provide an algorithm to test the...

A Causal Approach to Hierarchical Decomposition in Reinforcement Learning

Journal

Journal title: Social Science Research Network

Year: 2021

ISSN: 1556-5068

DOI: https://doi.org/10.2139/ssrn.3792824